Systematic literature review

Vincent Bagilet (https://www.sipa.columbia.edu/experience-sipa/sipa-profiles/vincent-bagilet), Columbia University (https://www.columbia.edu/); Léo Zabrocki, Paris School of Economics (https://www.parisschoolofeconomics.eu/en/)
2020-12-09

Purpose of the document

In this document, we aim to conduct a systematic review of the literature on the short-term health effects of air pollution. The objective is twofold:
- Retrieve effect sizes and confidence intervals in order to compute the power, type M and type S errors in the literature.
- Get a sense of the proportion of papers in this literature that discuss power and missing data issues.

Motivation

In this section, we discuss the importance of such an analysis.

Power analysis

In this section, we compute the power, type M and type S errors of the studied articles and check how robust they are to the assumed true effect size: we look at what the power, type M and type S errors would be if the true effect were a fraction of the measured effect. The estimates and confidence intervals of the articles in the literature of interest were retrieved in another document. Before turning to the power analysis itself, we look at the characteristics of the articles considered.
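To make these three quantities concrete, here is a minimal sketch in Python of how they can be computed for a single estimate when the true effect is assumed to be a fraction of the measured one. It relies on a normal approximation and closed-form expressions; it is not the authors' code, and the function name `retro_design_normal` as well as the example estimate and standard error are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def retro_design_normal(true_effect, se, alpha=0.05):
    """Power, type S and type M errors under a normal approximation.

    Assumes the estimate is distributed as N(true_effect, se^2) and that
    significance is declared when |estimate| > z * se.
    """
    if true_effect <= 0 or se <= 0:
        raise ValueError("true_effect and se must be strictly positive")
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    lam = true_effect / se
    p_right = 1 - STD_NORMAL.cdf(z - lam)  # significant, correct sign
    p_wrong = STD_NORMAL.cdf(-z - lam)     # significant, wrong sign
    power = p_right + p_wrong
    type_s = p_wrong / power
    # E[|estimate| ; significant], expressed in units of se
    expected_abs = (
        lam * (p_right - p_wrong)
        + STD_NORMAL.pdf(z - lam)
        + STD_NORMAL.pdf(z + lam)
    )
    type_m = expected_abs / (power * lam)
    return {"power": power, "type_s": type_s, "type_m": type_m}


# Hypothetical measured effect and standard error, for illustration only.
measured_effect, standard_error = 1.2, 0.5
for fraction in (0.50, 0.67, 0.75):
    result = retro_design_normal(fraction * measured_effect, standard_error)
    print(fraction, result)
```

The smaller the assumed fraction, the lower the power and the larger the expected exaggeration (type M) of the estimates that reach statistical significance.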

Articles characteristics

Full set of articles

We retrieved the articles using the following query:

'TITLE(("air pollution" OR "air quality" OR "particulate matter" OR ozone OR "nitrogen dioxide" OR "sulfur dioxide") AND ("emergency" OR "mortality") AND NOT ("long term" or "long-term")) AND ("particulate matter" OR ozone OR "nitrogen dioxide" OR "sulfur dioxide")'

This query returns the following articles:

Based on the abstracts, we can briefly explore the main themes of the articles.

Abstracts with effects and confidence intervals

Out of all the articles returned by the query, 700 have abstracts that mention confidence intervals, i.e. contain phrases such as “CI” or “confidence interval”. The list of such articles is as follows:

In these articles, we retrieve valid effects and confidence intervals in the following proportions (see note 1):

Table 1: Number of articles for which at least one effect is retrieved (out of those containing the phrase “CI”)
Effect retrieved  Number of articles  Proportion
Yes               592                 0.8457143
No                108                 0.1542857

This corresponds to 1858 valid effects and associated confidence intervals.
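As an illustration of what such a retrieval step can look like, the sketch below pulls "estimate (95% CI: lower, upper)" patterns out of an abstract with a regular expression. This is a hypothetical Python example, not the extraction code used for this review: the pattern, the function name `extract_effects`, the example abstract and the notion of validity used here (the estimate lies within its interval) are all assumptions.

```python
import re

NUM = r"-?\d+(?:\.\d+)?"

# Matches e.g. "1.05 (95% CI: 1.01, 1.09)" or "2.3% (95% CI 0.4-4.1)".
EFFECT_PATTERN = re.compile(
    rf"(?P<est>{NUM})\s*%?\s*\(\s*95\s*%\s*(?:CI|confidence interval)"
    rf"\s*[:,=]?\s*(?P<low>{NUM})\s*%?\s*(?:,|to|-|–)\s*"
    rf"(?P<high>{NUM})\s*%?\s*\)",
    flags=re.IGNORECASE,
)


def extract_effects(abstract):
    """Return (estimate, lower, upper) tuples found in an abstract.

    A match is kept only if lower <= estimate <= upper, which is the
    assumed definition of a "valid" effect in this sketch.
    """
    effects = []
    for match in EFFECT_PATTERN.finditer(abstract):
        est = float(match.group("est"))
        low = float(match.group("low"))
        high = float(match.group("high"))
        if low <= est <= high:
            effects.append((est, low, high))
    return effects


# Hypothetical abstract, for illustration only.
abstract = (
    "PM2.5 was associated with a relative risk of 1.05 (95% CI: 1.01, 1.09); "
    "for ozone the increase was 2.3% (95% CI 0.4-4.1)."
)
print(extract_effects(abstract))
```

An abstract that only mentions "CI" in passing, without reporting an estimate and its bounds, yields an empty list, which is one way an article can end up in the "No" row of Table 1.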

Analysis

To compute the power, type M and type S errors, we use the retrodesign package. To use its retro_design() function, we first need to compute the standard error of each estimate. Probably because of rounding, the standard error computed from the upper bound of the CI often differs from the one computed from the lower bound. We therefore average the two values obtained.
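The standard error computation described above can be sketched as follows. This is a minimal Python illustration, not the authors' code; it assumes 95% confidence intervals built from a normal approximation on the same scale as the estimate, and the function name `se_from_ci` and the example values are hypothetical.

```python
from statistics import NormalDist

# Critical value for a two-sided 95% interval (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def se_from_ci(estimate, ci_low, ci_high, z=Z_95):
    """Average the standard errors implied by each bound of the CI."""
    if not ci_low <= estimate <= ci_high:
        raise ValueError("estimate must lie within its confidence interval")
    se_from_lower = (estimate - ci_low) / z
    se_from_upper = (ci_high - estimate) / z
    return (se_from_lower + se_from_upper) / 2


# Hypothetical rounded estimate and bounds, for illustration only.
print(se_from_ci(1.20, 0.61, 1.80))
```

Averaging the two values is equivalent to dividing the width of the interval by twice the critical value, so the result does not depend on where the rounded point estimate falls inside the interval.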

Results and graphs

Then, we quickly explore the results. First, we compute the average and median power, type M and type S errors.

prop_true_effect  mean_power  mean_typeM  mean_typeS  median_power  median_typeM  median_typeS
0.50              0.6629577   1.579196    0.0054233   0.7556944     1.156594      2.6e-06
0.67              0.7570693   1.345587    0.0029559   0.9445702     1.033339      0.0e+00
0.75              0.7937769   1.277935    0.0023906   0.9782421     1.013360      0.0e+00

Then, we look at the distribution of power, type M and type S errors across simulations and for different sizes of the true effect.

Then, we look at the relation between power, type M and type S error and true effect size.

We can also look at how power, type M and type S error evolved with publication date.

Analysis of papers dealing with power and missingness issues

Retrieving the full texts

We then download the full texts (using the ft_get functions). The full texts are stored in XML format in the cache directory. Note that, because my IP address is located outside of Columbia, I cannot access the texts from Scopus.

Once we have downloaded all the files, we can put them into a table format, before analyzing them.

Analysis

To analyse the texts, we start by computing the proportion of articles that mention the words “missing” and “power”.
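A minimal sketch of such a keyword count is given below, in Python and on made-up texts. It is not the authors' code: the function names and example strings are hypothetical, and matching whole words case-insensitively is an assumption about how a "mention" is defined.

```python
import re


def mentions(text, word):
    """True if `word` appears in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def share_mentioning(texts, word):
    """Proportion of texts that mention `word` at least once."""
    if not texts:
        raise ValueError("texts must not be empty")
    return sum(mentions(text, word) for text in texts) / len(texts)


# Hypothetical full texts, for illustration only.
full_texts = [
    "Days with missing pollutant readings were dropped.",
    "We discuss the statistical power of our design.",
    "Emissions from a nearby power plant were controlled for.",
    "Results were robust to alternative lag structures.",
]
for word in ("missing", "power"):
    print(word, share_mentioning(full_texts, word))
```

Such counts are only a first pass: the third text mentions "power" in a non-statistical sense, and whole-word matching skips variants such as "missingness", so flagged articles still need to be read.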


  1. Note that a number of abstracts contain the phrase “CI” without actually reporting effects and confidence intervals.